In Figure 21-1, observe that each line ends with a code, and there’s a legend at the bottom. Six of the
ten participants (#’s 1, 2, 4, 6, 9, and 10, labeled X) died during the course of the follow-up study.
Two participants (#5 and #7, labeled L) were LFU at some point during the study, and two participants
(#3 and #8, labeled E) were still alive at the end of the study. So this study has four participants — the
Ls and the Es — with censored survival times.
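As a minimal sketch, the outcome codes read off Figure 21-1 can be recorded in a small lookup table and used to flag which survival times are censored. The variable names (outcome, censored, and so on) are illustrative assumptions, not anything defined in this chapter.

```python
# Outcome codes from the Figure 21-1 legend:
#   X = died, L = lost to follow-up (LFU), E = alive at end of study
outcome = {1: "X", 2: "X", 3: "E", 4: "X", 5: "L",
           6: "X", 7: "L", 8: "E", 9: "X", 10: "X"}

# A survival time is censored whenever the death was not observed.
censored = {pid: code != "X" for pid, code in outcome.items()}

censored_ids = sorted(pid for pid, is_cens in censored.items() if is_cens)
n_deaths = sum(not is_cens for is_cens in censored.values())

print("Censored participants:", censored_ids)
print("Observed deaths:", n_deaths)
```

Keeping the code for each participant, rather than only a survival time, is what lets later calculations treat the Ls and Es differently from the Xs.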
So, how do you analyze survival data containing censoring? The following sections explain the correct
ways to proceed as well as mistakes to avoid.
Analyzing censored data properly
Statisticians have developed techniques to utilize the partial information contained in censored
observations. Later in this chapter, we describe two of the most popular techniques: the
life-table method and the Kaplan-Meier (K-M) method. To understand these methods, you first
need to understand two fundamental concepts, hazard and survival:
The hazard rate is the probability of the participant dying in the next small interval of time,
assuming the participant is alive right now, expressed per unit of time.
The survival rate is the probability of the participant living for a certain amount of time after
some starting time point.
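To make the two definitions concrete, here is a rough sketch for a cohort with no censoring, using one-year intervals. The death counts are made-up numbers chosen for easy arithmetic, and the variable names are assumptions for illustration only. The hazard for each year is the fraction of people alive at the start of that year who die during it, and the survival probability is the running product of (1 - hazard).

```python
# HYPOTHETICAL cohort with no censoring: 100 people alive at time 0,
# and the number who die during each successive year (made-up numbers).
deaths_per_year = [10, 18, 18, 27]

alive = 100
survival = 1.0
rows = []
for year, deaths in enumerate(deaths_per_year, start=1):
    hazard = deaths / alive      # P(die during this year | alive at its start)
    survival *= 1 - hazard       # P(still alive at the end of this year)
    rows.append((year, alive, deaths, hazard, survival))
    alive -= deaths              # survivors enter the next year

for year, at_risk, deaths, hazard, surv in rows:
    print(f"year {year}: at risk={at_risk:3d}  deaths={deaths:2d}  "
          f"hazard={hazard:.3f}  survival={surv:.3f}")
```

In this sketch the same number of deaths (18) gives a higher hazard in year 3 than in year 2, because fewer people are still at risk. That is the sense in which hazard is conditional on being alive right now, while survival is measured from the starting time point.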
The first task when analyzing survival data is usually to describe how the hazard and survival rates
vary with time. In this chapter, we show you how to estimate the hazard and survival rates, summarize
them as tables, and display them as graphs. Most of the larger statistical packages (such as those
described in Chapter 4) can perform these calculations automatically, so you may never
have to do them manually. But without first understanding how these methods work, it’s almost
impossible to understand any other aspect of survival analysis, so we provide a demonstration for
instructional purposes.
Making mistakes with censored data
Here are two mistakes you need to avoid when working with survival data:
You shouldn’t exclude participants with a censored survival time from any survival analysis!
You shouldn’t replace a censored date with some other value, a practice called imputing. With
other kinds of numerical data, it is common to replace a missing value with the last observed value
for that participant (called last observation carried forward, or LOCF, imputation). However, you
should not impute dates in survival analysis.
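The following sketch illustrates how the two mistakes above can distort a simple summary such as mean survival time. Only the X, L, and E codes come from Figure 21-1. The follow-up times in years are hypothetical values invented for this illustration, as are the variable names.

```python
# HYPOTHETICAL follow-up times in years (invented for illustration);
# only the X/L/E outcome codes are taken from Figure 21-1.
records = [
    (1, 1.5, "X"), (2, 3.0, "X"), (3, 6.0, "E"), (4, 2.0, "X"), (5, 4.0, "L"),
    (6, 0.5, "X"), (7, 2.5, "L"), (8, 7.0, "E"), (9, 4.5, "X"), (10, 3.5, "X"),
]

death_times = [t for _, t, code in records if code == "X"]
all_times = [t for _, t, _ in records]

# Mistake 1: exclude the censored participants entirely.
mean_excluding = sum(death_times) / len(death_times)

# Mistake 2: impute each censoring date as if it were the date of death.
mean_imputed = sum(all_times) / len(all_times)

# What the data actually say: a censored time is only a lower bound
# on that participant's true survival time.
lower_bounds = [t for _, t, code in records if code != "X"]

print("Mean after excluding censored:", mean_excluding)
print("Mean after imputing deaths:   ", mean_imputed)
print("Censored participants lived at least:", lower_bounds)
```

With these made-up numbers, exclusion throws away the two longest follow-up times, and imputation can only understate the true mean, because every censored participant survived at least as long as the recorded time and possibly much longer.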
Exclusion and imputation don’t work to fix the missingness in censored data. You can see why in
Figure 21-2, where we’ve slid the timelines for all the participants over to the left as if they all had
their surgery on the same date. The time scale shows survival time in years after surgery instead of
chronological time.